🎓 Student Performance ML Analysis Report

Comprehensive Machine Learning Analysis of Student Academic Performance

Dataset: 9000 students | 10 original features | 7 Grade categories

9000 Total Students | 13 Models Trained | 100.0% Best Accuracy | 🏆 Best Model: Decision Tree

📋 1. Dataset Overview

The dataset contains 9000 student records from Martin Luther School with scores in Math, Physics, and Chemistry (range: 10–100).

Grade Distribution

Grade | Count | Percentage | Description
------|-------|------------|--------------------------
A+    |    49 |       0.5% | Excellent Performance
A     |   360 |       4.0% | Very Good Achievement
B+    |  1014 |      11.3% | Good Performance
B     |  1797 |      20.0% | Average Performance
C     |  2187 |      24.3% | Below Average Achievement
D     |  2887 |      32.1% | Poor Performance
F     |   706 |       7.8% | Failed
Key Insight: The dataset is imbalanced — Grade D is the most common (32.1%), while A+ is extremely rare (0.5%). This class imbalance affects model performance, especially for minority classes.
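A quick way to quantify this imbalance is with pandas. The sketch below rebuilds the grade counts from the table (column name `Grade` is assumed to match the dataset):

```python
# Sketch: inspecting class balance with pandas, using the counts
# reported in the grade distribution table above.
import pandas as pd

grades = pd.Series(
    ["A+"] * 49 + ["A"] * 360 + ["B+"] * 1014 + ["B"] * 1797
    + ["C"] * 2187 + ["D"] * 2887 + ["F"] * 706,
    name="Grade",
)
dist = grades.value_counts(normalize=True).mul(100).round(1)
print(dist)  # D tops the list at 32.1%; A+ is rarest at 0.5%
```

The same `value_counts(normalize=True)` call on the real DataFrame reproduces the percentage column directly.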

📊 2. Exploratory Data Analysis

Interpretation (grade distribution): The grade distribution is roughly bell-shaped but skewed toward lower grades. D is the most frequent grade, suggesting many students struggle across subjects. The rare A+ class (49 students) will be hardest for models to predict.

Interpretation (subject distributions): All three subjects show approximately uniform distributions across the 10–100 range, with means around 53–56. No subject appears inherently harder or easier.

Interpretation (correlations): Math, Physics, and Chemistry scores show near-zero correlation with each other (~0.00), meaning student performance in one subject is independent of others. This is an interesting finding — performing well in Math doesn't predict Physics or Chemistry scores.
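The near-zero correlations can be checked with a one-line `DataFrame.corr()` call. The sketch below uses synthetic uniform scores standing in for the real columns (column names assumed):

```python
# Sketch: inter-subject correlation check on synthetic uniform scores
# (10-100), mimicking the independence seen in the real data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame(
    rng.integers(10, 101, size=(9000, 3)),
    columns=["Math", "Physics", "Chemistry"],
)
corr = df.corr()
print(corr.round(2))  # off-diagonal entries near 0.00 for independent draws
```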

🤖 3. Classification Model Results (Grade Prediction)

We trained 13 classification models to predict student grades from their Math, Physics, and Chemistry scores plus engineered features.

#  | Model                     | Accuracy | Precision | Recall | F1 Score
---|---------------------------|----------|-----------|--------|---------
1  | Decision Tree             | 1.0000   | 1.0000    | 1.0000 | 1.0000
2  | Random Forest             | 1.0000   | 1.0000    | 1.0000 | 1.0000
3  | Gradient Boosting         | 1.0000   | 1.0000    | 1.0000 | 1.0000
4  | Gradient Boosting (Tuned) | 1.0000   | 1.0000    | 1.0000 | 1.0000
5  | Voting Ensemble           | 1.0000   | 1.0000    | 1.0000 | 1.0000
6  | Stacking Ensemble         | 1.0000   | 1.0000    | 1.0000 | 1.0000
7  | Random Forest (Tuned)     | 0.9994   | 0.9995    | 0.9994 | 0.9994
8  | SVM (linear)              | 0.9917   | 0.9918    | 0.9917 | 0.9916
9  | MLP Neural Network        | 0.9872   | 0.9873    | 0.9872 | 0.9872
10 | SVM (rbf)                 | 0.9739   | 0.9742    | 0.9739 | 0.9738
11 | Logistic Regression       | 0.9711   | 0.9716    | 0.9711 | 0.9702
12 | KNN (k=5)                 | 0.9467   | 0.9469    | 0.9467 | 0.9466
13 | Naive Bayes               | 0.9439   | 0.9458    | 0.9439 | 0.9442
🏆 Best Model: Decision Tree
Accuracy: 1.0000 | F1 Score: 1.0000

Tree-based models (Decision Tree, Random Forest, Gradient Boosting) reach perfect accuracy on this dataset because the grade boundaries are sharp thresholds on the engineered Total_Score/Average_Score features, and axis-aligned tree splits can reproduce such thresholds exactly.
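The training setup behind the table can be sketched as follows. The data and grade boundaries here are synthetic stand-ins (the real thresholds are not published in this report), but the pattern — raw scores plus an engineered total, a stratified split, one classifier fit — mirrors the described pipeline:

```python
# Sketch: grade classification on synthetic scores with hypothetical
# total-score grade boundaries, mirroring the report's pipeline.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
scores = rng.integers(10, 101, size=(9000, 3)).astype(float)
total = scores.sum(axis=1)
# Hypothetical grade boundaries on the total score (7 classes)
y = np.digitize(total, [90, 120, 150, 180, 210, 240])
X = np.column_stack([scores, total])  # raw scores + engineered Total_Score

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"Decision Tree test accuracy: {acc:.4f}")
```

Because the labels are a deterministic threshold function of Total_Score, the tree recovers the boundaries almost exactly — the same mechanism that produces the 1.0000 rows above.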

🔢 4. Confusion Matrices

Interpretation: Confusion matrices show where models make mistakes. The diagonal represents correct predictions. Most errors occur between adjacent grades (e.g., B vs B+, C vs D), which is expected since these grades have overlapping score ranges. The rare A+ class is often misclassified due to limited training examples.
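A minimal sketch of how such a matrix is built with scikit-learn (toy labels here; in the report this is run on the held-out grade predictions):

```python
# Sketch: confusion matrix on toy grade labels; the diagonal counts
# correct predictions, off-diagonal cells count confusions.
from sklearn.metrics import confusion_matrix

y_true = ["B", "B", "B+", "C", "D", "D", "A"]
y_pred = ["B", "B+", "B+", "C", "D", "C", "A"]
labels = ["A", "B+", "B", "C", "D"]
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)  # errors sit next to the diagonal: B vs B+, D vs C
```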

📈 5. ROC Curves (Pass/Fail Classification)

Interpretation: ROC curves show the tradeoff between true positive rate and false positive rate for binary Pass/Fail classification. All models achieve high AUC scores, indicating that distinguishing between passing and failing students is relatively straightforward based on score features. Models with AUC > 0.90 are considered excellent classifiers.
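A sketch of the binary Pass/Fail AUC computation on synthetic data; the pass threshold used here is a hypothetical stand-in for the report's binarisation:

```python
# Sketch: Pass/Fail AUC with scikit-learn on synthetic scores;
# the total-score pass threshold (100) is an assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.integers(10, 101, size=(2000, 3)).astype(float)
y = (X.sum(axis=1) >= 100).astype(int)  # hypothetical pass rule
clf = LogisticRegression(max_iter=1000).fit(X, y)
auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
print(f"AUC: {auc:.3f}")  # comfortably above the 0.90 "excellent" bar
```

Because the pass rule is a linear threshold on the score sum, even a linear model separates the classes almost perfectly, matching the report's observation that Pass/Fail is an easy task.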

🎯 6. Feature Importance Analysis

Key Findings:
  • Total_Score and Average_Score are the most important features, as grades are primarily determined by the combined performance across all subjects.
  • Min_Score is also highly important — a very low score in any subject can significantly lower the overall grade.
  • Individual subject scores (Math, Physics, Chemistry) contribute roughly equally, confirming that no single subject dominates grade determination.
  • Score_Range and Score_Std capture the consistency of performance — students with high variance across subjects tend to receive different grades than consistently performing students.
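The engineered features named above can be derived in a few lines of pandas (column names assumed to match the dataset):

```python
# Sketch: the engineered features used alongside the raw subject scores.
import pandas as pd

df = pd.DataFrame({"Math": [85, 40], "Physics": [90, 95], "Chemistry": [80, 30]})
scores = df[["Math", "Physics", "Chemistry"]]
df["Total_Score"] = scores.sum(axis=1)
df["Average_Score"] = scores.mean(axis=1)
df["Min_Score"] = scores.min(axis=1)
df["Max_Score"] = scores.max(axis=1)
df["Score_Range"] = df["Max_Score"] - df["Min_Score"]
df["Score_Std"] = scores.std(axis=1)  # per-student spread across subjects
print(df.round(2))
```

The second toy row (40/95/30) illustrates the Min_Score effect: one very low subject score pulls both Min_Score and the grade down despite a strong Physics result.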

🔄 7. Cross-Validation Results

Interpretation: Cross-validation provides a more reliable estimate of model performance by training and testing on different data splits. Low standard deviation in CV scores indicates stable, reliable models. Gradient Boosting and Random Forest typically show the best balance of high accuracy and low variance.
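The CV procedure can be sketched as below, again on synthetic data with hypothetical grade bins:

```python
# Sketch: 5-fold cross-validation as used in section 7 (synthetic data).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
X = rng.integers(10, 101, size=(1000, 3)).astype(float)
y = np.digitize(X.sum(axis=1), [120, 165, 210])  # hypothetical bins
cv_scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")
```

Reporting mean ± standard deviation across folds is exactly how "high accuracy, low variance" is judged in the interpretation above.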

📚 8. Learning Curves Analysis

Interpretation:
  • If training and validation curves converge at a high score → model generalizes well
  • Large gap between training and validation → overfitting (model memorizes training data)
  • Both curves plateau at a low score → underfitting (model is too simple)
  • Random Forest may show signs of slight overfitting (high training score, lower validation)
  • Logistic Regression curves converge quickly, suggesting the model is simpler but stable
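The convergence check in the bullets can be computed directly with scikit-learn's `learning_curve`; the sketch below measures the final train/validation gap on synthetic data:

```python
# Sketch: learning-curve diagnostic; a small final gap between train and
# validation scores indicates good generalization (synthetic data).
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.integers(10, 101, size=(1500, 3)).astype(float)
y = (X.sum(axis=1) >= 165).astype(int)  # hypothetical pass rule
sizes, train_sc, val_sc = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.2, 1.0, 5), cv=5,
)
gap = train_sc.mean(axis=1)[-1] - val_sc.mean(axis=1)[-1]
print(f"final train/val gap: {gap:.3f}")  # near zero -> no overfitting
```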

🔮 9. Clustering Results (Unsupervised Learning)

Interpretation: K-Means clustering reveals natural groupings in the data based on score patterns. The PCA visualization shows that grade labels roughly correspond to clusters in the reduced feature space, but there is significant overlap between adjacent grades. DBSCAN identifies the core dense regions and outlier students with unusual score combinations.
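The K-Means + PCA pipeline described above follows a standard pattern: standardize, cluster, then project to 2-D for plotting. A minimal sketch on synthetic scores:

```python
# Sketch: K-Means clustering with a 2-D PCA projection for visualization
# (synthetic data; k=4 is an arbitrary choice here).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
X = rng.integers(10, 101, size=(1000, 3)).astype(float)
Xs = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Xs)
coords = PCA(n_components=2).fit_transform(Xs)  # 2-D view for the scatter plot
print(coords.shape, np.bincount(labels))
```

Coloring `coords` by true grade versus by `labels` is what reveals the overlap between adjacent grades noted in the interpretation.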

📈 10. Regression Model Results

Model                       | R²     | MAE    | RMSE
----------------------------|--------|--------|-------
Linear Regression           | 1.0000 | 0.0000 | 0.0000
Ridge Regression            | 1.0000 | 0.0000 | 0.0000
Lasso Regression            | 1.0000 | 0.0052 | 0.0064
Random Forest Regressor     | 0.9980 | 0.5199 | 0.6724
Gradient Boosting Regressor | 0.9969 | 0.6497 | 0.8298
Interpretation: Linear Regression achieves perfect R² = 1.000 for predicting Total_Score from individual subject scores because Total_Score = Math + Physics + Chemistry (a perfect linear relationship). For Average_Score prediction, tree-based regressors capture non-linear patterns slightly better than linear models.
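The perfect-fit claim is easy to verify: since Total_Score is an exact linear combination of the inputs, least squares recovers coefficients of 1 and R² = 1. A sketch on synthetic scores:

```python
# Sketch: Linear Regression is exact when the target is a linear
# combination of the features (Total_Score = Math + Physics + Chemistry).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(9)
X = rng.integers(10, 101, size=(500, 3)).astype(float)
y = X.sum(axis=1)  # Total_Score target
model = LinearRegression().fit(X, y)
r2 = r2_score(y, model.predict(X))
print(np.round(model.coef_, 6), round(r2, 6))  # coefficients [1, 1, 1], R^2 = 1.0
```

The tree-based regressors in the table can only approximate this linear surface piecewise, which is why their MAE/RMSE are nonzero.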

🎯 11. Key Conclusions & Recommendations

Summary of Findings:

  1. Best classification model: Decision Tree with 100.0% accuracy
  2. Subject scores are independent — Math, Physics, Chemistry show ~0 correlation
  3. Grade is determined by total/average score, not by any single subject
  4. Class imbalance affects prediction of rare grades (A+ and F)
  5. Ensemble methods (Voting, Stacking) provide robust predictions
  6. Feature engineering (Total, Average, Min, Max, Range, Std) significantly improves model performance

⚠️ Important Notes:

  • The 'Comment' column was dropped as it maps 1:1 to Grade (would cause data leakage)
  • All students are from the same school, so school-level variation cannot be analyzed
  • The grade boundaries appear to be based on total/average score thresholds
  • With only 3 input features, simpler models often perform comparably to complex ones